1. INTRODUCTION

The study and comprehension of video sequences have become an active area of research in computer
vision over the past few years due to their increasing importance in numerous video analysis applications, such
as video surveillance and multimedia applications that detect motion in the video scene [1], [2]. Identifying
motion in a video sequence is important in target detection [3]-[5] and behaviour interpretation [6], [7]. Thus,
the initial operation is distinguishing between the foreground and background objects. This can be done in
various ways, depending on the available data and whether the object is moving. Detecting moving objects in a
video sequence requires no prior information; a series of consecutive frames is sufficient.

Many motion-based video systems still struggle to deal with dynamically changing backgrounds. Dynamic
or moving background objects can cause massive false detections [8] and are a significant contributor to
false alarms in event detection. In order to raise the sensitivity to important events of interest, the security
industry has a tremendous need to either detect and suppress these false alarms or mitigate the effects of
background changes.

Many new approaches for identifying motion have received attention. The optical flow method [9]-[12],
the inter-frame difference method [13]-[16], and the background subtraction method [17]-[20] are the most
widely used approaches for detecting motion. The method is chosen according to the detection scenario.
The inter-frame difference method extracts a moving target from consecutive video frames and adapts well to
scene changes. However, this approach causes a cavity effect, reducing target detection accuracy. The
optical flow approach uses the brightness of the target image. It is rarely employed because of its
processing complexity and poor resistance to interference. Meanwhile, background subtraction focuses on
building a stable background model to detect motion.

Subtracting the background from a video sequence is one of the most straightforward approaches to
finding motion in the video. It is an essential procedure for the majority of computer vision applications.
In its most basic form, a background subtraction method requires a consistent background, which is
an extremely challenging requirement for applications that run in real time. The video sequence is separated
into its component frames, and each frame is then subtracted from a background model or reference
image. The pixels of the current frame that differ from the background model are assumed to represent the
moving object. The foreground object then goes through additional processing for object localization and
tracking. Because removing the background is the initial step in many computer vision applications,
it is essential that the obtained foreground pixels correspond precisely to the moving object of interest.
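
The basic procedure described above can be sketched in a few lines. This is a minimal illustration, assuming grayscale frames stored as nested lists and a fixed, hand-picked threshold; the function name `subtract_background` and the threshold value are hypothetical and are not taken from any of the surveyed methods.

```python
from typing import List

Frame = List[List[int]]  # grayscale image: rows of 0-255 intensities


def subtract_background(frame: Frame, background: Frame, threshold: int = 30) -> Frame:
    """Return a binary mask that is 1 where the frame differs from the background.

    The fixed threshold is an assumption made for illustration only.
    """
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


# A static reference image and a frame in which one column has changed.
background = [[10, 10, 10], [10, 10, 10]]
frame = [[12, 200, 11], [9, 210, 10]]
mask = subtract_background(frame, background)
```

Small intensity fluctuations stay below the threshold and are ignored, while the changed column is marked as foreground. A single fixed reference image like this is exactly what breaks down when the background itself changes, which motivates the statistical models discussed next.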

Background subtraction uses a fairly straightforward algorithm. However, it is highly sensitive
to changes in the surrounding environment and has weak anti-interference capabilities. Many
researchers have presented background subtraction approaches to deal with a variety of problems.
Approaches to modelling the background can be divided into pixel-based, region-based, and hybrid
approaches. In addition, background modelling techniques can be classified as either
parametric or nonparametric. Basic models [21], background estimation [22], background clustering [23],
subspace learning, kernel density estimation, and the Gaussian mixture model (GMM) are some of the background
modelling methodologies that are often used for video. Among these, the GMM technique has gained
much interest due to its ability to handle image or video noise, shadow, camouflage, slow-moving objects,
multimodal backgrounds, and illumination changes. However, researchers
are still investigating and making new contributions to improve object-detection
performance. The purpose of this work is to provide a summary of research that has been successfully carried
out on GMM-based object detection in relation to various background environments. The rest of this paper is
structured as follows. Section 2 introduces and explains the GMM. Section 3 comprehensively
summarises the research on the various GMM versions with respect to various background conditions. Section
4 presents our conclusions.


2. GAUSSIAN MIXTURE MODEL 

The GMM was first introduced in [24] as a background modelling technique
that creates a robust tracking system able to handle multiple moving objects, variations in lighting, moving
scene clutter, and other arbitrary changes to the scene. The fundamental concept behind this method is to
model every pixel over a sequence of frames with Gaussian distributions. The foreground and background
are each indicated by a different weight, with the foreground having a smaller weight than the background. If
a new pixel matches one of the K Gaussian models, it is treated as a background pixel. If it
matches none of the K Gaussian models, it is treated as a foreground pixel. Each pixel is described by a
random variable X, which is modelled as a mixture of K Gaussian distributions, as
shown in (1).
for computational reasons, and K is also determined by the available computational power and memory.
In practice, the values of a pixel can be modelled by a Gaussian mixture with K between 3 and 5.
These methods assume that the red, green, and blue (RGB) pixel values are independent and have the same
variance. This assumption avoids a costly matrix inversion at the expense of some accuracy. As a result, the GMM
is applied in this research in order to characterize the pixel values observed in the scene.
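
For reference, the mixture density that this section builds on is commonly written as shown below in the adaptive-mixture literature. The notation is an assumption chosen to agree with the update rules later in this section ($\omega_{k,t}$ for the weight, $\mu_{k,t}$ for the mean, $\sigma_k$ for the standard deviation, $n$ for the dimension of $X_t$); it is a sketch of the standard form, not a transcription of the numbered equations of this paper.

```latex
% Probability of observing pixel value X_t under a mixture of K Gaussians
P(X_t) = \sum_{k=1}^{K} \omega_{k,t}\, \eta\!\left(X_t, \mu_{k,t}, \Sigma_{k,t}\right)

% Gaussian density of dimension n
\eta(X_t, \mu, \Sigma) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}}
  \exp\!\left(-\tfrac{1}{2}\,(X_t-\mu)^{T}\,\Sigma^{-1}\,(X_t-\mu)\right)

% Independent channels with equal variance
\Sigma_{k,t} = \sigma_k^{2}\, I
```

The last line expresses the equal-variance, independent-channel assumption described above. With it, applying $\Sigma^{-1}$ reduces to a division by $\sigma_k^2$, which is why no matrix inversion is needed.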

The background model estimation process starts by comparing every new pixel value with the existing
K Gaussian distributions until a match is found. A pixel value within 2.5 standard deviations of a
distribution is considered a match. If the matched distribution is one of the background
models, the pixel is regarded as background. Otherwise, it is regarded as foreground. The first B
Gaussian distributions whose cumulative weight exceeds a certain threshold T are chosen as the background
model, which can be written as (4)


$$B = \arg\min_{b} \left( \sum_{k=1}^{b} \omega_{k,t} > T \right) \qquad (4)$$
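
The matching rule and the choice of B can be sketched for a single grayscale pixel as follows. This is a simplified illustration under stated assumptions: each component is a hypothetical `(weight, mean, standard deviation)` tuple, components are ranked by weight divided by standard deviation, and the threshold value `T = 0.7` is chosen only for the example.

```python
from typing import List, Tuple

Gaussian = Tuple[float, float, float]  # (weight, mean, standard deviation)


def background_components(mixture: List[Gaussian], T: float) -> List[Gaussian]:
    """Rank components by weight/sigma and keep the first B of them,
    where B is the smallest count whose cumulative weight exceeds T."""
    ranked = sorted(mixture, key=lambda g: g[0] / g[2], reverse=True)
    chosen: List[Gaussian] = []
    cumulative = 0.0
    for component in ranked:
        chosen.append(component)
        cumulative += component[0]
        if cumulative > T:
            break
    return chosen


def is_background(x: float, mixture: List[Gaussian],
                  T: float = 0.7, k_sigma: float = 2.5) -> bool:
    """A value is background if it lies within k_sigma standard deviations
    of at least one background component."""
    return any(abs(x - mean) <= k_sigma * sd
               for _, mean, sd in background_components(mixture, T))


# Illustrative mixture: two stable modes and one weak, wide mode.
mixture = [(0.6, 100.0, 5.0), (0.3, 150.0, 10.0), (0.1, 30.0, 20.0)]
```

With this mixture the first two components are selected as background, so a value of 104 is classified as background while a value of 35, which is only explained by the weak third component, is classified as foreground.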


The Gaussian parameters such as the weight, the mean, and the variance must be updated for the
subsequent foreground detection. The weight, the mean, and the variance of the k-th Gaussian in the mixture at
time t are updated as follows


$$\omega_{k,t} = (1 - \alpha)\,\omega_{k,t-1} + \alpha\,(M_{k,t}) \qquad (5)$$
$$\mu_{t} = (1 - \rho)\,\mu_{t-1} + \rho\,X_{t} \qquad (6)$$
where $\alpha$ is the learning rate and $\rho$ is a second learning rate equal to $\alpha\,\eta(X_t, \mu_k, \sigma_k)$.
The value of $M_{k,t}$ is 1 for the matched model and 0 for the remaining models. For unmatched models, the
mean and the variance remain unchanged. After this approximation, the weights are renormalized.
If the current pixel does not match any of the K Gaussians in the mixture, the distribution
with the lowest probability is replaced with a distribution that has the current value as its mean,
a high initial variance, and a low weight. The value of $\omega/\sigma$ is used to rank the Gaussians. This value
increases as a distribution gains more evidence and its variance decreases.
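
One update step for a single grayscale pixel can be sketched as below. Several details are assumptions made for illustration: the variance is updated with the commonly used rule $\sigma_t^2 = (1-\rho)\sigma_{t-1}^2 + \rho (X_t - \mu_t)^2$, the least probable component is taken to be the one with the lowest $\omega/\sigma$, components are tested in list order rather than rank order, and the initial variance and weight of a replaced component are hypothetical values.

```python
import math
from typing import List, Optional


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """One-dimensional Gaussian density."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def update_mixture(x: float, weights: List[float], means: List[float],
                   variances: List[float], alpha: float = 0.01,
                   k_sigma: float = 2.5, init_var: float = 900.0,
                   init_weight: float = 0.05) -> Optional[int]:
    """Update one pixel's mixture in place; return the matched index or None."""
    matched = None
    for k in range(len(weights)):
        if abs(x - means[k]) <= k_sigma * math.sqrt(variances[k]):
            matched = k
            break

    # Weight update: the match indicator is 1 for the matched model, else 0.
    for k in range(len(weights)):
        indicator = 1.0 if k == matched else 0.0
        weights[k] = (1.0 - alpha) * weights[k] + alpha * indicator

    if matched is not None:
        # Mean and variance change only for the matched component.
        rho = alpha * gaussian_pdf(x, means[matched], variances[matched])
        means[matched] = (1.0 - rho) * means[matched] + rho * x
        variances[matched] = ((1.0 - rho) * variances[matched]
                              + rho * (x - means[matched]) ** 2)
    else:
        # Assumption: "least probable" means lowest weight/sigma.
        worst = min(range(len(weights)),
                    key=lambda k: weights[k] / math.sqrt(variances[k]))
        means[worst], variances[worst], weights[worst] = x, init_var, init_weight

    # Renormalize so the weights sum to one.
    total = sum(weights)
    for k in range(len(weights)):
        weights[k] /= total
    return matched
```

Calling `update_mixture` once per frame for each pixel gives the recursive behaviour described above: a matched component gains weight and drifts toward the new value, while a value that matches nothing starts a new, wide, low-weight component.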


3. GMM FOR BACKGROUND SUBTRACTION ISSUES 

GMM is able to establish multiple distributions rather than explicitly modelling the pixel values as
one particular type of distribution [25]. GMM has become a classical parametric model for moving object
detection, classifying pixels as background or foreground, due to its effectiveness and robustness over diverse
distributions [26]. The background can be static or dynamic. It is straightforward to recognize
objects against a static background. When there is motion in the background, the required object detection
system is considerably more complicated. It needs to accommodate dynamic
backgrounds while maintaining satisfactory performance in real time. In the next subsections, we describe
various background scenarios and the solutions to these issues using a GMM-based object detection
system.


3.1. Image or video noise 

The apparent size of a moving object in the image can vary due to camera imaging characteristics,
and noise can easily influence the results of motion detection. Because of this, the precision of
moving object identification is easily compromised [27], [28]. The presence of noise in an image
not only affects how the image appears to the human eye but also affects the
subsequent processing of the image, such as the extraction of image features and the classification and
recognition of images. Therefore, the acquired image needs to be denoised before it is processed.
This step enhances the image quality and makes the acquired image easier to process [26].
Chen and Ellis used a multi-dimensional Gaussian kernel density transform (MDGKT) pre-processor
to reduce noise in the spectral, temporal, and spatial domains. Each spectral component is smoothed
spatially and temporally using a multivariate kernel that can be thought of as the product of two radially
symmetric kernels. The Euclidean metric enables a single bandwidth setting for each domain [30], [31].
In this kernel, the spatial component, denoted by $x^s$, and the temporal component, denoted by $x^t$, make up
the feature vector $x$; $g_s$ and $g_t$ are the bandwidths of the kernel, and $M$ is the associated normalization
constant. The MDGKT plays a vital role as a pre-processor that enhances the reliability of the GMM. The
size of the kernel is controlled with nothing more than a pair of bandwidth parameters $(g_s, g_t)$, which in turn
determines the time interval and the resolution of the GMM. This adjustment significantly affects the efficiency
and precision of background subtraction without compromising spatial flexibility.
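
The idea of a product kernel with one spatial and one temporal bandwidth can be sketched as follows. This is a generic spatio-temporal Gaussian smoothing of one pixel, written under the assumption that both factors are Gaussian and truncated to a small window; it is not the MDGKT as published, and the names `smooth_pixel`, `radius_s`, and `radius_t` are hypothetical.

```python
import math
from typing import List

Video = List[List[List[float]]]  # video[t][y][x], one spectral channel


def smooth_pixel(video: Video, t: int, y: int, x: int, g_s: float, g_t: float,
                 radius_s: int = 1, radius_t: int = 1) -> float:
    """Weighted average around (t, y, x), where each neighbour's weight is the
    product of a spatial Gaussian (bandwidth g_s) and a temporal Gaussian
    (bandwidth g_t). Neighbours outside the video are skipped."""
    numerator = 0.0
    denominator = 0.0
    for dt in range(-radius_t, radius_t + 1):
        for dy in range(-radius_s, radius_s + 1):
            for dx in range(-radius_s, radius_s + 1):
                tt, yy, xx = t + dt, y + dy, x + dx
                inside = (0 <= tt < len(video)
                          and 0 <= yy < len(video[0])
                          and 0 <= xx < len(video[0][0]))
                if not inside:
                    continue
                w_spatial = math.exp(-0.5 * (dx * dx + dy * dy) / g_s ** 2)
                w_temporal = math.exp(-0.5 * (dt * dt) / g_t ** 2)
                weight = w_spatial * w_temporal
                numerator += weight * video[tt][yy][xx]
                denominator += weight
    return numerator / denominator
```

Dividing by the sum of the weights plays the role of the normalization constant. A larger `g_t` averages over more frames and suppresses temporal noise at the cost of slower response, while a larger `g_s` trades spatial detail for smoothness; for colour video the same smoothing would be applied to each spectral channel separately.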

Kalti and Mahjoub incorporated a fuzzy distance into both the expectation-maximization
(EM) and adaptive distance-based fuzzy-C-means (ADFCM) algorithms. In this research,
pixels are characterized by two features. The first feature describes the inherent attributes of the pixel,
and the second describes the entire neighbourhood of the pixel. The classification is then determined
by an adaptive distance, which gives preference to one of the two features depending on the spatial
location of the pixel in the picture. The results demonstrate that their
method performs considerably better than the conventional fuzzy-C-means (FCM) and EM, particularly with
regard to robustness to noise and the precision of the edges between regions.

Zuo et al. improved the robustness of the conventional GMM to noise by proposing a new method
that combines image block averaging, wavelet semi-threshold denoising, and adaptive background
updating. In the background modelling step, the video frame is divided into blocks to simplify computation
and speed up modelling, and the background model is rebuilt using image block averaging.
In the moving target detection stage, wavelet semi-threshold function denoising is used in
conjunction with the closing operation of mathematical morphology, which effectively removes the noise
and improves detection performance. During the background updating
phase, adaptive background updating is used to produce more
accurate detection results. The new approach is both subjectively and objectively preferable to the
conventional GMM, which validates the efficacy of the system and demonstrates its flexibility.
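
The block averaging step can be illustrated as below. This is a minimal sketch that assumes non-overlapping square blocks and a grayscale frame stored as nested lists; the block size and the function name `block_average` are hypothetical choices, not parameters reported for the method.

```python
from typing import List


def block_average(frame: List[List[float]], block: int = 2) -> List[List[float]]:
    """Replace every pixel with the mean of the non-overlapping block it
    belongs to. Blocks at the right and bottom edges may be smaller."""
    height, width = len(frame), len(frame[0])
    out = [[0.0] * width for _ in range(height)]
    for y0 in range(0, height, block):
        for x0 in range(0, width, block):
            cells = [(y, x)
                     for y in range(y0, min(y0 + block, height))
                     for x in range(x0, min(x0 + block, width))]
            average = sum(frame[y][x] for y, x in cells) / len(cells)
            for y, x in cells:
                out[y][x] = average
    return out


averaged = block_average([[1, 3, 10, 10], [5, 7, 10, 10]], block=2)
```

Because all pixels of a block share one value, a mixture model only needs to be maintained per block rather than per pixel, which is where the reduction in modelling cost comes from. Averaging also damps pixel-level noise before the model sees it.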

Wei and Zheng calculate the L2 norm between the GMMs corresponding to two pixels to
determine the degree of similarity between them. The GMM of a pixel represents both the grey level of the
pixel and the richness of features in the local region of the image.
This allows the pixel grey intensity and the variety of information in the surrounding region of the
image to be measured more precisely when pixels are compared. Because
of this similarity measure, the performance of the denoising model is improved, and the detailed information
of the image is preserved.
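
For one-dimensional mixtures the L2 distance has a closed form, because the integral of the product of two Gaussians is itself a Gaussian evaluated at the difference of the means with the variances added. The sketch below uses that identity; it assumes each mixture is a list of hypothetical `(weight, mean, variance)` tuples and is an illustration of the distance itself, not of the full denoising method.

```python
import math
from typing import List, Tuple

Component = Tuple[float, float, float]  # (weight, mean, variance)


def _overlap(m1: float, v1: float, m2: float, v2: float) -> float:
    """Integral of N(x; m1, v1) * N(x; m2, v2) over x."""
    var = v1 + v2
    return math.exp(-0.5 * (m1 - m2) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def _inner(f: List[Component], g: List[Component]) -> float:
    """Inner product of two mixtures, summed over all component pairs."""
    return sum(wf * wg * _overlap(mf, vf, mg, vg)
               for wf, mf, vf in f
               for wg, mg, vg in g)


def gmm_l2_distance(f: List[Component], g: List[Component]) -> float:
    """L2 distance between two mixtures: sqrt(<f,f> - 2<f,g> + <g,g>)."""
    squared = _inner(f, f) - 2.0 * _inner(f, g) + _inner(g, g)
    return math.sqrt(max(squared, 0.0))  # clamp tiny negative rounding errors
```

Two pixels whose local mixtures are identical have distance zero, and the distance grows as the modes move apart or the weights differ, which is what makes it usable as a similarity measure between pixels.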

Luo et al. used a motion detection method that considers the variation in the spatial image
threshold. The authors calculate the projected size of motion in the image regions by establishing the
mapping relationship between the geometric features of motion in the image regions and the reasonable level
circumscribed rectangle (BLOB) of motion in the geographic space. This method is able to set an adaptive
threshold for each motion in order to remove unwanted noise during motion detection.


3.2. Sudden illumination change 

It is generally accepted that background models can accommodate slow but steady changes in the
appearance of the environment. For instance, the amount of light in outdoor environments shifts throughout the
day. However, the illumination of a scene can also change suddenly. Switching a light on or off in
an indoor space is one example. Such changes can also occur outdoors,
for example in a sudden shift from cloudy to sunny conditions. The amount of illumination significantly affects
how the background appears and can lead to false-positive detections [34].

Chen and Ellis proposed a global illumination background model adaptation and an online
dynamic learning rate in order to deal with this challenge. The researchers devised a novel approach
using a revised adaptive strategy in the iterative Zivkovic-Heijden GMM (ZHGMM) learning procedure. In
this step, they introduce the median of quotient (MofQ) global illumination change factor $h$ between the
previously learned background and the input picture. For all of the pixels that make up a set $S$, the MofQ
global illumination change factor $h$ between the current picture $i_c$ and the reference picture $i_r$ is
described as follows,


$$h = \operatorname{median}_{s \in S} \left( \frac{i_{c,s}}{i_{r,s}} \right) \qquad (9)$$


In addition, the mixture model introduces a counter $c$ for each Gaussian component. The counter $c$
keeps track of how many data points have been passed to the Gaussian estimation algorithm, while the factor $h$
monitors how the global illumination varies over time. This adjustment significantly improves the
convergence as well as the background subtraction accuracy while maintaining the same degree of temporal
flexibility. The method is implemented by recursively adding a modified input adaptive schedule to an
existing filtering system. Under sudden illumination changes, its performance is noticeably better
than that of previous approaches.
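
The median-of-quotients factor itself is simple to compute. The sketch below assumes grayscale images stored as nested lists and guards against division by zero with a small hypothetical constant `eps`; how the factor is then fed into the ZHGMM update is not detailed here, so the final rescaling of the reference is only one plausible use.

```python
from statistics import median
from typing import List

Image = List[List[float]]


def median_of_quotients(current: Image, reference: Image, eps: float = 1e-6) -> float:
    """Median over all pixels of current / reference intensity."""
    quotients = [c / max(r, eps)
                 for row_c, row_r in zip(current, reference)
                 for c, r in zip(row_c, row_r)]
    return median(quotients)


# The light is switched on (all intensities scale by 1.5) while one pixel
# is covered by a dark foreground object.
reference = [[100.0, 50.0], [80.0, 120.0]]
current = [[150.0, 75.0], [120.0, 30.0]]
h = median_of_quotients(current, reference)

# One plausible use (an assumption): rescale the reference by the factor.
compensated = [[r * h for r in row] for row in reference]
```

Because the median ignores the minority of pixels that changed for other reasons, the factor here equals 1.5 even though one pixel is occluded. A mean of the quotients would have been pulled down by that pixel.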

Martins et al. developed a novel classification mechanism that blends colour space discrimination
capabilities with hysteresis and a dynamic learning rate to update the background model under sudden
illumination change. Each channel L*, a*, b* is analysed on its own, and the decisions obtained from each are
merged using the AND rule, which produces good results compared to those obtained from a decision based on
majority voting. A hysteresis method is used to prevent noisy pixels, whose colour distance is very close to the
decision threshold, from inadvertently altering the classification. If the pixel is categorized as background,
the dynamic background learning rate depends on the number of Gaussians present in the mixture, denoted
by $M$: it equals $M$ times a fixed minimum value of the background learning rate.
This ensures that the model adapts more
quickly in dynamic regions and more slowly in static regions. In contrast, if the pixel classification shifts from
foreground to background, a higher learning rate is applied. This encourages rapid adaptation
when the background is uncovered, which helps prevent the appearance of ghost artefacts.

Agrawal and Natu developed a GMM with BLOB analysis of the connected components, including
labelling and morphological operations, to increase the accuracy of foreground detection. The suggested model
can be broken down into two stages: the training phase, which produces a reference image,
and the testing phase, which produces a binary mask. The model computes the difference
between the reference frame BMG(x, y) and the current frame, and then applies a threshold in order to
extract the region of interest. When constructing the foreground model, a threshold
determined from the standard deviation is selected for each pixel. Compared to other methods, the results
demonstrate that integrating GMM with BLOB analysis and morphological operations yields a
lower number of incorrectly classified pixels.
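
The two-stage idea of a trained reference image followed by a per-pixel, standard-deviation-based threshold can be sketched as follows. For brevity the reference is a plain per-pixel mean over the training frames, which stands in for the GMM-based reference of the method; the multiplier `k` and the floor `min_sd` are hypothetical values.

```python
from statistics import mean, pstdev
from typing import List, Tuple

Frame = List[List[float]]


def train_reference(frames: List[Frame]) -> Tuple[Frame, Frame]:
    """Training phase: per-pixel mean and standard deviation over the frames."""
    height, width = len(frames[0]), len(frames[0][0])
    reference = [[mean([f[y][x] for f in frames]) for x in range(width)]
                 for y in range(height)]
    spread = [[pstdev([f[y][x] for f in frames]) for x in range(width)]
              for y in range(height)]
    return reference, spread


def binary_mask(frame: Frame, reference: Frame, spread: Frame,
                k: float = 2.5, min_sd: float = 2.0) -> List[List[int]]:
    """Testing phase: mark a pixel as foreground when it deviates from the
    reference by more than k times its own standard deviation."""
    return [[1 if abs(frame[y][x] - reference[y][x]) > k * max(spread[y][x], min_sd)
             else 0
             for x in range(len(frame[0]))]
            for y in range(len(frame))]


training = [[[10.0, 20.0]], [[12.0, 20.0]], [[14.0, 20.0]]]
reference, spread = train_reference(training)
mask = binary_mask([[13.0, 60.0]], reference, spread)
```

Pixels that fluctuated during training receive a wider tolerance than pixels that were perfectly stable. In the full method, the resulting mask would then be cleaned with morphological operations and connected-component labelling.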

Su presented a GMM with data model optimization to adapt to lighting transitions. The first step of the
approach is to calculate the gradient image of the video stream using the Scharr operator.
Then, RGB and gradient information are integrated, and shape-based operations are used to eliminate noisy
movement areas and merge those that remain. To reduce the likelihood of an incorrect detection, the outputs
of the two models are compared to arrive at the final detection area. Finally, the assessment
and comparison were conducted using three separate image streams. The results reveal that this method increases
detection accuracy by minimizing the erroneous detection areas caused by sudden illumination
changes.


3.3. Shadow 

Foreground objects frequently cast shadows due to changing light, which typically affects the
segmentation of foreground objects and the execution of subsequent modules of a background modelling
algorithm. More specifically, a shadowed area shows a substantial difference in brightness but only a minor
variation in colour. A pixel is considered part of a shadow if it belongs to
the background model but has been made darker by a shadow cast by another object in the scene.
Therefore, a reliable method should be able to eliminate shadows cast by the foreground regions
or disregard shadows that are not relevant to the problem at hand [39].
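
The observation that a shadow darkens a pixel without changing its colour much is often turned into a test in HSV space. The sketch below shows one such commonly used rule; it is not taken from any specific method surveyed here, and all four tolerance parameters are hypothetical values chosen for the example.

```python
import colorsys
from typing import Tuple

RGB = Tuple[int, int, int]


def is_shadow(pixel: RGB, background: RGB, v_low: float = 0.4, v_high: float = 0.9,
              s_tol: float = 0.15, h_tol: float = 0.1) -> bool:
    """Return True when the pixel looks like a darker version of the background:
    brightness ratio inside [v_low, v_high], with hue and saturation nearly equal."""
    h_p, s_p, v_p = colorsys.rgb_to_hsv(*(c / 255.0 for c in pixel))
    h_b, s_b, v_b = colorsys.rgb_to_hsv(*(c / 255.0 for c in background))
    if v_b == 0.0:
        return False  # a black background cannot be darkened further
    hue_diff = abs(h_p - h_b)
    hue_diff = min(hue_diff, 1.0 - hue_diff)  # hue is circular
    return (v_low <= v_p / v_b <= v_high
            and abs(s_p - s_b) <= s_tol
            and hue_diff <= h_tol)
```

For a background colour of (200, 150, 100), the darker pixel (120, 90, 60) keeps the same hue and saturation and is flagged as shadow, whereas a blue object such as (30, 30, 200) fails the brightness and hue tests and stays in the foreground.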
Yadav and Xiaogang presented a hybrid method relying on the GMM, background subtraction,
the hue-saturation-value (HSV) colour model, feature extraction, and neural networks. To provide
a clean foreground, the shadows cast by moving objects are identified and eliminated using
the HSV colour model, and morphological operations are then carried out. Once
detection is finished, the background is updated to follow the dynamic background. After the
objects are identified, their shape properties are extracted using Hu's seven moment invariants of
the training samples of the image data. These shape features are then fed into the back propagation
neural network (BPNN) during training. The system can remove the influence of shadows and
accurately detect motion. The experimental results show that the suggested approach has strong
robustness as well as real-time performance in realistic environments.

Jin et al. presented an improved GMM-based automatic segmentation method (IGASM) that improves
the background updating approach so that floats on the sea surface can be segmented more effectively.
After the GMM results are mapped into the HSV colour space, a light-shadow discriminant
function is applied to address the problems caused by shadows. A morphological open operation
is then applied to refine the foreground obtained previously. Finally, the graph cuts technique
is used to optimize the segmentation results based on the spatial information in the video images; together
with the open operation, it smooths the contour. The improved updating approach makes it
possible to detect stationary floats. As a result, the accuracy of the segmentation results is increased
further. The experimental results reveal
that this method segments surface floats on the water more efficiently, particularly
in situations where there is a considerable change in light while the surface floats do not move.

Lin and Chen provide a method for detecting moving objects that is based on the GMM and
visual saliency maps. This method can eliminate the disruption caused by shadows and achieve
stable detection of moving objects. In the first step, the GMM approach is used to construct models for the
video sequences and obtain rough foreground objects. This rough foreground, however, contains a significant
number of incorrect detections, so the moving objects cannot be extracted adequately from it. In the second
step, the rough foreground objects are refined using visual saliency to achieve reliable
detection results. After each image frame is converted to the L*, a*, b* colour space, the L*, a*, b*
channels are smoothed with a Gaussian filter in order to remove small texture features as well as noise. The
saliency maps of the channels are then estimated and linearly merged to generate the final saliency map.
Finally, the final saliency map and the foreground are combined to produce the moving objects.
The shadow issue can be successfully addressed using this method.


3.4. Multimodal background 

A multimodal situation is created whenever there is motion in the background. The scenery
in the background might contain motion such as a fountain, moving clouds, swaying
trees, or waves on water. This movement can be regular or irregular at various intervals. The
traditional GMM is capable of handling multimodal backgrounds robustly. However, the parameter K is fixed
experimentally, and its value remains unchanged, which is not ideal in terms of detection time
and computation [24]. An effective GMM improvement should be able to detect both regular and irregular
motion [39].

Ou et al. proposed an adaptive GMM (AGMM) with BPNN to extract the foreground objects
in multimodal background conditions. In most cases, AGMM is employed to accomplish the twin goals of
simplifying the algorithm and enhancing its precision. With enough single-Gaussian components, the mixture
model can fit all image pixels. The neural network estimates the statistical parameters of the image noise,
so the model can handle the noise well and the foreground objects can be preserved entirely. This method
solves the problem of foreground objects being damaged by morphological processing. It eliminates the need
for the model structure to trade off between the foreground objects and the noise. For this reason, the authors
adopt an adaptive version of the GMM supported by a neural network, which improves the robustness
and performance of the entire system.

Zuo et al. proposed a GMM-based technique for moving target detection that overcomes
multimodal backgrounds. Compared to other methods, its main advantage is that it eliminates
disturbance from dynamic backgrounds and improves detection. Each image of the video sequence is first
divided into blocks. In the background modelling step, each pixel value is then replaced with the
average of the pixels of its image block, and the block-mean GMM approach is applied.
During moving target detection, a combination of mathematical morphology and a semisoft threshold
function is employed to remove noise from the foreground image.
The soft and hard threshold methods each have distinct disadvantages, so the
authors present a revised semisoft threshold function to overcome the limitations of both.
The conventional GMM approach cannot update the background in real time, resulting
in ghosting and poor moving target detection accuracy. The adaptive background update algorithm uses
the current detected frame and the background model to tackle this problem. The approach is preferable to the
conventional method both quantitatively and qualitatively, which substantiates the claims about the efficiency
and adaptability of the system.
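
The difference between hard, soft, and semisoft thresholding of a wavelet coefficient can be made concrete with the sketch below. The semisoft function shown is the classical two-threshold form, included only to illustrate the idea; the revised function proposed by the authors is not reproduced here, and the threshold names `t`, `t1`, and `t2` are hypothetical.

```python
import math


def hard_threshold(w: float, t: float) -> float:
    """Keep coefficients above t unchanged; this leaves a jump at |w| = t."""
    return w if abs(w) > t else 0.0


def soft_threshold(w: float, t: float) -> float:
    """Shrink every surviving coefficient by t; continuous, but it biases
    large coefficients."""
    return math.copysign(abs(w) - t, w) if abs(w) > t else 0.0


def semisoft_threshold(w: float, t1: float, t2: float) -> float:
    """Classical semisoft rule with t1 < t2: zero below t1, identity above t2,
    and a linear ramp in between that joins the two continuously."""
    magnitude = abs(w)
    if magnitude <= t1:
        return 0.0
    if magnitude <= t2:
        return math.copysign(t2 * (magnitude - t1) / (t2 - t1), w)
    return float(w)
```

With `t1 = 1` and `t2 = 3`, a coefficient of 2 is mapped to 1.5, a coefficient of 0.5 is removed, and a coefficient of 5 is left untouched. The function is continuous like soft thresholding but leaves large coefficients unbiased like hard thresholding, which is the compromise the text refers to.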


3.5. Camouflage 

Camouflage means that the foreground may contain an element whose colour is identical to that of the
background. This creates confusion during detection and makes it difficult to determine whether
something is in front of the background. Even with carefully tuned parameters, the traditional GMM still
tends to create more false alarms.

Zhang et al. use a combination of foreground matching and a measure of short-term stability to handle
camouflage. Priority is given to matching incoming pixels that are probable foreground against foreground
models that have been constructed and updated using previously detected foreground pixels. Meanwhile,
stability is assessed at the pixel level to ensure that an integrated foreground is identified when
dynamic foreground is being processed. If the current pixel value does not match any
existing background model, a foreground model is constructed with the current
pixel value as its mean and a large value as its variance. In this approach, foreground
models take precedence over background models. Foreground models always try to
match the pixels that follow, which reduces the likelihood that foreground pixels are
wrongly matched to the background models. The short-term stability, which is computed from the previous
pixel values only if the existing foreground model does not match the next pixel, can be written as (10)


Here, the total number of frames represents the amount of time that has passed. The combination of
foreground matching and short-term stability measures significantly improves the tolerance of the GMM to
camouflage at slow speeds.

Lima et al. proposed a method that estimates a threshold for each region through a feedback step.
Spatial analysis is used to determine a threshold, which is then used to classify the pixel. To correct
some classification mistakes, the segmentation is filtered before the threshold is estimated. This
filtering removes disturbances and merges each area into one cohesive whole. During the
feedback stage, the corrections are used to estimate the thresholds for the subsequent iteration. Since the
filtering stage only corrects foreground mistakes, the improvement is concentrated on the vehicle areas. The
proposed strategy favours the segmentation of regions that have already been segmented. As a
result, the threshold estimation is comparable to a first-order Markov chain. This technique lowers the
error on subsequent iterations when applied to fixed areas.


3.6. Slow moving object 

Motions with unique patterns, such as moving slowly or staying still, are significant for security
protection since they make detection exceedingly difficult [45]-[49]. In addition, these kinds of motions make
it more challenging to identify perceived risks. In the real world, many things move slowly. When
dealing with this kind of object, the majority of the currently available GMM-based algorithms typically
extract fragmented bodies, which significantly reduces detection precision. The traditional GMM
algorithm is predicated on the idea that the background is more likely to be seen than any particular
foreground. Because of this assumption, a slowly moving object that is observed for a long time may be
misidentified as background, which can lead to significant complications. It is not appropriate to include a
slow-moving object in the background until the object has stopped moving altogether and has been still for
a sufficient time [43].

Zhang et al. use the foreground matching technique to detect slow-moving objects. A GMM is
used for the foreground and is continuously updated using newly identified foreground pixels. It explicitly
takes into consideration the spatial continuity of the moving objects. The foreground model is given
priority in fitting the pixels, and pixels that match the foreground models are explicitly designated as
foreground. This prevents long-term foreground from being wrongly absorbed into the background. The results
show improved performance in identifying slow-moving objects in
complicated circumstances.

Fu et al. studied an initial Gaussian background model (IGBM) based on an extended Kalman
filter to improve GMM performance for slow-moving objects. To address this issue, the
IGBM is not only combined with the GMM; an extended Kalman filter [51]-[53] based tracker is also added
to the system for cases where the grey value of a static object is comparable to the grey background value.
In this way, a moving object can be retained while it remains static. Meanwhile, the tracker
can reliably determine whether the grey value of a stationary object is comparable to the
grey background value. This research assumes a total of $N$ single Gaussian distributions in the GMM, and
a new background model, the IGBM, is constructed from these initial Gaussian distributions. Because
it is made up of complete single Gaussian distributions rather than the initial GMM, it retains
most of the information in the original background model. The results
indicate that the EGMM approach can efficiently address the challenge of recognizing moving objects that stop
and start intermittently.

Zhang et al. presented the GMM with confidence measurement (GMMCM) to address the problem that the
background model is easily contaminated by vehicles that are moving slowly or temporarily stopped. A
confidence measurement (CM) is applied to each pixel of the background model to quantify its current trust
value. To avoid contaminating the background model in complicated urban traffic scenes, the method uses a
self-adaptive learning rate to maintain the background model while balancing dynamic brightness changes
against the background. The results show that the proposed GMMCM copes well with vehicles that are moving
slowly or temporarily stopped.
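
The idea of a per-pixel trust value that scales the learning rate can be sketched as follows. This is only an illustration under simplifying assumptions, not the GMMCM formulation: each pixel is reduced to a single running mean, and the confidence rule, the names `BackgroundPixel`, `update_pixel`, and `simulate`, and every parameter value are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BackgroundPixel:
    mean: float               # current background grey-level estimate
    confidence: float = 0.0   # hypothetical trust value in [0, 1]


def update_pixel(pixel, value, base_rate=0.05, threshold=20.0,
                 gain=0.1, decay=0.01):
    """Classify `value`, then update the pixel with a confidence-scaled rate.

    Returns True when `value` is classified as foreground.
    """
    is_foreground = abs(value - pixel.mean) > threshold
    if is_foreground:
        # A trusted background pixel resists being overwritten by foreground.
        rate = base_rate * (1.0 - pixel.confidence)
        pixel.confidence = max(0.0, pixel.confidence - decay)
    else:
        rate = base_rate
        pixel.confidence = min(1.0, pixel.confidence + gain)
    pixel.mean += rate * (value - pixel.mean)
    return is_foreground


def simulate(use_confidence):
    """Road surface for 20 frames, then a vehicle stopped for 30 frames."""
    pixel = BackgroundPixel(mean=100.0)
    sequence = [100.0] * 20 + [180.0] * 30
    flags = []
    for value in sequence:
        if not use_confidence:
            pixel.confidence = 0.0   # degenerates to a fixed learning rate
        flags.append(update_pixel(pixel, value))
    return pixel.mean, flags
```

With the fixed rate, the stopped vehicle is absorbed into the background before the 30 frames are over. With the confidence-scaled rate, the background estimate drifts far less and the vehicle is still detected in every frame of this toy sequence.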


3.7. Bootstrapping 

Background initialization for bootstrapping video sequences is frequently employed in intelligent video
surveillance systems that monitor crowded public spaces. Background subtraction approaches have recently
concentrated on this problem because a training period free of foreground objects is not always available in
congested scenes [55], [56]. Given a video sequence captured by a stationary camera in which the background
is occluded by foreground objects in every frame, background initialization estimates a background frame that
contains no foreground objects [57], [58]. This situation arises when foreground objects constantly occupy
the scene, and two cases are possible. In the first, the scene is so active that no single frame is free of
objects. In the second, the scene is crowded, but some initial frames still contain no objects.
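
To make the problem statement concrete, the sketch below estimates a background frame from a sequence in which every frame contains a foreground object. It uses a per-pixel temporal median, a common baseline that is assumed here purely for illustration and is not one of the GMM-based methods reviewed in this section. The function name `estimate_background` and the toy data are hypothetical.

```python
from statistics import median


def estimate_background(frames):
    """Per-pixel temporal median over equally sized greyscale frames.

    `frames` is a list of 2-D lists (rows of grey values).
    """
    if not frames:
        raise ValueError("at least one frame is required")
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[y][x] for frame in frames) for x in range(width)]
        for y in range(height)
    ]


# Toy 1x3 sequence: the background value is 10 and a bright object (200)
# occupies a different pixel in each frame, so no frame is object-free.
frames = [
    [[200, 10, 10]],
    [[10, 200, 10]],
    [[10, 10, 200]],
]
background = estimate_background(frames)
```

The median works here only because each pixel shows the background in more than half of the frames. In scenes where that assumption fails, more elaborate initialization is needed.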

Amintoosi et al. proposed identifying the background by combining the GMM with QR decomposition from
linear algebra. The R-values obtained from the QR decomposition of a system reflect the relevance of each of
its decomposed components. After dividing the image into small blocks, the method examines each block to
determine which ones make the least significant contribution to the overall picture and selects those blocks.
The simulation results indicate that the background detection performance is superior to that of other methods.
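
The role of the R-values can be illustrated with a small sketch. It computes the diagonal of R with modified Gram-Schmidt for a few flattened image blocks treated as columns. How the cited method arranges its blocks and selects among them is not specified here, so the column layout, the function name `r_diagonal`, and the toy data are assumptions.

```python
import math


def r_diagonal(columns, eps=1e-12):
    """Return |R_jj| for each column, using modified Gram-Schmidt QR.

    |R_jj| is the norm of what remains of column j after removing its
    projection onto all earlier columns, i.e. its new contribution.
    """
    basis = []   # orthonormal vectors built so far (columns of Q)
    diag = []
    for col in columns:
        residual = [float(v) for v in col]
        for q in basis:
            proj = sum(r * qi for r, qi in zip(residual, q))
            residual = [r - proj * qi for r, qi in zip(residual, q)]
        norm = math.sqrt(sum(r * r for r in residual))
        diag.append(norm)
        if norm > eps:
            basis.append([r / norm for r in residual])
    return diag


# The same 2x2 block observed at three times, flattened into columns.
blocks = [
    [10, 10, 10, 10],     # background only
    [10, 200, 200, 10],   # an object passes through the block
    [10, 10, 10, 10],     # background again
]
diag = r_diagonal(blocks)
```

In this toy case the third column repeats the first, so its R-value is zero: it contributes nothing that earlier columns did not already contain, whereas the column holding the object has the largest R-value.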

Harville et al. model the background with a per-pixel, time-adaptive GMM in the combined input space of
depth and luminance-invariant colour. They extend this combination by modulating the learning rate of the
background model according to scene activity and by making the colour-based segmentation criteria dependent
on depth observations. The results demonstrate that the method is significantly more robust to troublesome
phenomena than the previous state of the art without sacrificing real-time performance, which makes it well
suited to a wide range of practical applications in video event detection and recognition.

Table 1 summarizes all the background challenges in motion detection using GMM approaches.


Table 1. Summary of background subtraction challenges in motion detection using GMM 


No.  Background issue           Modification of GMM                                           Authors

1    Image or video noise       1. MOD-AT based on adaptive threshold.                        Luo et al.
                                2. Calculating the L2 norm.                                   Wei and Zheng
                                3. Image block averaging method, wavelet semi-threshold,      Zuo et al.
                                   and adaptive background updating method.
                                4. Fuzzy distance in EM and FCM algorithm.                    Kalti and Mahjoub
                                5. A multi-dimensional Gaussian kernel density transform.     Chen and Ellis

2    Sudden illumination change 1. Online dynamic learning.                                   Chen and Ellis
                                2. Colour space discrimination capabilities with              Martins et al.
                                   hysteresis and dynamic learning rate.
                                3. Blob analysis and morphology.                              Agrawal and Natu
                                4. Data model optimization.                                   Su

3    Shadow                     1. HSV colour model feature extraction and neural network.    Yadav and Xiaogang
                                2. Background updating strategy.                              Jin et al.
                                3. Visual saliency map.                                       Lin and Chen

4    Multimodal background      1. AGMM and BPNN hybrid method.                               Ou et al.
                                2. Image block averaging method, wavelet semi-threshold,      Zuo et al.
                                   and adaptive background updating method.

5    Camouflage                 1. Foreground matching and short-term stability measure.      Zhang et al.
                                2. The threshold for each region by feedback step.            Lima et al.

6    Slow-moving object         1. Foreground matching.                                       Zhang et al.
                                2. Initial GMM and extended Kalman filter.                    Fu et al.
                                3. Confidence measurement.                                    Zhang et al.

7    Bootstrapping              1. QR decomposition.                                          Amintoosi et al.
                                2. Learning rate adaptation based on scene activity.          Harville et al.


4. CONCLUSION 

This study presents a comprehensive assessment of the numerous GMM methods that have been developed to
address various background issues, together with a concise description of GMM methods. The background
challenges that GMM variants can address include image or video noise, sudden illumination change, shadow,
multimodal background, camouflage, slow-moving objects, and bootstrapping. Image and video noise has
received significant attention from researchers, while camouflage has received less. Some GMM variants can
address several challenges simultaneously. The findings of this study can help researchers select a suitable
GMM variant for their applications. Moreover, the survey of GMM techniques and its comprehensive
bibliography can provide helpful insights into this critical background topic and motivate future research.
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